Abstract. This paper considers games of prediction with expert advice.
We present a modification of the Kalai and Vempala algorithm of following
the perturbed leader for the case of unrestrictedly large one-step gains.
We show that in the general case the cumulative gain of any probabilistic
prediction algorithm can be much worse than the gain of some expert of the
pool. Nevertheless, we give a lower bound for this cumulative gain in the
general case and construct a universal algorithm which has the optimal
performance; we also prove that when the one-step gains of the experts of
the pool have "limited deviations", the performance of our algorithm is
close to the performance of the best expert.

1 Introduction 

Expert algorithms are used for online prediction, repeated decision making, or
repeated game playing. Any such algorithm is based on a "pool of experts". At
any step $t$, each expert gives its recommendation. From these, a "master decision"
is made. After that, losses (or rewards) $s_t^i$ are assigned to each expert
$i = 1, \dots, m$ by the environment (or adversary). The master algorithm also receives
some loss or reward depending on the master decision. The goal of the master
algorithm is to perform almost as well as the best expert in hindsight in the long
run.

Prediction with Expert Advice considered in this paper proceeds as follows. 
We are asked to perform sequential actions at times t = 1,2, ... ,T. At each 
time step t, we observe results of actions of experts in the form of their gains 
and losses on steps < t. After that, at the beginning of the step t Learner makes 
a decision to follow one of these experts, say Expert i. At the end of step t 
Learner receives the same gain or loss as Expert i at step t. 

We use notations and definitions from [5] and [7]. Let $s^i_{1:t} = s^i_1 + \dots + s^i_t$
be the cumulative loss of Expert $i$ at time $t$. Given $s^i_{1:t-1}$, $i = 1, \dots, m$, at
time $t$, a natural idea to solve the expert problem is "to follow the leader", i.e.,
to select the expert $i$ which performed best in the past. The following simple
example from Kalai and Vempala [7] shows that Learner can perform much worse
than each expert: let the current losses of two experts on steps $t = 1, \dots, 6$ be
$s^1_{1,2,3,4,5,6} = (0, 1, 0, 1, 0, 1)$ and $s^2_{1,2,3,4,5,6} = (\frac12, 0, 1, 0, 1, 0)$. The "Follow Leader"
algorithm always chooses the wrong prediction.
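
The example can be replayed in a few lines. The sketch below is only an illustration: it assumes the two loss sequences are (0, 1, 0, 1, 0, 1) and (1/2, 0, 1, 0, 1, 0), and the function name and the tie-breaking rule (lowest index wins) are hypothetical choices, not taken from [7].

```python
def follow_the_leader(losses):
    """Return the per-step losses of Follow the Leader on `losses`.

    `losses[t][i]` is the loss of expert i at step t. Ties are broken in
    favour of the lowest index (an assumption of this sketch).
    """
    n = len(losses[0])
    cumulative = [0.0] * n
    incurred = []
    for step in losses:
        # The leader is the expert with the smallest past cumulative loss.
        leader = min(range(n), key=lambda i: cumulative[i])
        incurred.append(step[leader])
        cumulative = [c + x for c, x in zip(cumulative, step)]
    return incurred


expert1 = [0, 1, 0, 1, 0, 1]
expert2 = [0.5, 0, 1, 0, 1, 0]
ftl = follow_the_leader(list(zip(expert1, expert2)))
print(sum(ftl), sum(expert1), sum(expert2))
```

Under these assumptions the leader suffers loss 1 at every step from the second one on, so its total loss exceeds the total loss of either expert.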

The method of following the perturbed leader was discovered by Hannan [4].
Kalai and Vempala [7] rediscovered this method and published a simple proof
of the main result of Hannan. They called an algorithm of this type FPL
(Following the Perturbed Leader). Hutter and Poland [5] presented a further
development of the FPL algorithm for a countable class of experts, arbitrary
weights, and an adaptive learning rate.

The FPL algorithm outputs the prediction of an expert $i$ which minimizes

$$s^i_{1:t-1} - \frac{1}{\epsilon} \xi^i_t,$$

where $\xi^i_t$, $i = 1, \dots, n$, $t = 1, 2, \dots$, is a sequence of i.i.d. random variables
distributed according to the exponential distribution with the density $p(t) = e^{-t}$,
and $\epsilon$ is a learning rate. Kalai and Vempala [7] show that the expected cumulative
loss of the FPL algorithm has the upper bound

$$E(s_{1:t}) \le (1 + \epsilon) \min_{i=1,\dots,n} s^i_{1:t} + \frac{\log n}{\epsilon},$$

where $\epsilon$ is a learning rate and $n$ is the number of experts.
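
A minimal sketch of this rule with a fixed learning rate is given below. It assumes that a fresh vector of Exp(1) perturbations is drawn at every step; the names `fpl_choose` and `run_fpl` are hypothetical and serve only as an illustration.

```python
import random


def fpl_choose(cumulative_losses, eps, rng):
    """Pick the expert minimising s^i_{1:t-1} - xi^i / eps.

    The xi^i are fresh i.i.d. Exp(1) draws; `rng` is a random.Random instance.
    """
    perturbed = [s - rng.expovariate(1.0) / eps for s in cumulative_losses]
    return min(range(len(perturbed)), key=lambda i: perturbed[i])


def run_fpl(losses, eps, seed=0):
    """Play FPL on a list of per-step loss vectors and return the total loss."""
    rng = random.Random(seed)
    cumulative = [0.0] * len(losses[0])
    total = 0.0
    for step in losses:
        i = fpl_choose(cumulative, eps, rng)
        total += step[i]
        cumulative = [c + x for c, x in zip(cumulative, step)]
    return total


total = run_fpl([(0, 1), (1, 0)] * 3, eps=0.5)
```

A very large `eps` makes the perturbation negligible, so the rule degenerates into plain Follow the Leader; a small `eps` makes the choice close to uniform.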

In the papers cited above the loss of each expert $i$ can change at any step
$t$ by a bounded quantity, for example, $0 \le s_t^i \le 1$ for all $t$. Poland and
Hutter [6] extended this analysis to games with one-step losses upper bounded by
an increasing sequence $B_t$ given in advance, i.e., $s_t^i \le B_t$ for all $t$. Allenberg et
al. [1] also considered unbounded losses, but with a different algorithm than in
this paper.

In the games considered in this paper the players incur gains (a loss is a
negative gain); $s_t^i$ denotes the one-step gain of player $i$. For practical purposes, the
property $0 \le s_t^i \le 1$ seems to be too restrictive.

In Appendix A we consider an application of the results of Sections 2-4 of
this paper. We define two financial experts learning the fractional Brownian
motion whose one-step gains cannot be bounded in advance. This
application motivates our special interest in zero-sum games with
unbounded gains in Section 4.

In this paper we present a modification of the Kalai and Vempala algorithm
for the case of unrestrictedly large one-step gains not bounded in advance. We
show that in the general case the cumulative gain of any probabilistic prediction
algorithm can be much worse than the gain of some expert of the pool.
Nevertheless, we give a lower bound for the cumulative gain of any probabilistic
algorithm in the general case and prove that our universal algorithm has optimal
performance; we also prove that when the one-step gains of the experts of the pool
have "limited deviations" (in particular, when they are bounded) the performance
of our algorithm is close to the performance of the best expert. This result
improves on the results mentioned above.



2 Learning in games of two experts with unbounded 
gains 



In this section we give some preliminary results presenting the bounds on the 
performance of the algorithm constructed in Section 3. 

We consider a simple game $G$ of prediction with expert advice in which the
master follows one of two experts with unbounded one-step gains. The goal of the
master algorithm is to receive a cumulative gain not much worse than the gain of
the best expert in hindsight.

At each step $t$ of the game both experts receive nonnegative one-step gains
$s_t^1$ and $s_t^2$, and their cumulative gains after step $t$ are equal to
$s^1_{1:t} = s^1_{1:t-1} + s_t^1$ and $s^2_{1:t} = s^2_{1:t-1} + s_t^2$.

For simplicity, we consider a variant in which at each step $t$ of the game $G$
only one expert can receive a nonnegative one-step gain $s_t$, and the total gain
of the other expert is unchanged, i.e., $s^1_{1:t} = s^1_{1:t-1} + s_t$ and $s^2_{1:t} = s^2_{1:t-1}$, or
$s^2_{1:t} = s^2_{1:t-1} + s_t$ and $s^1_{1:t} = s^1_{1:t-1}$. In the general case the analysis is similar.

We also consider non-degenerate experts (games), i.e., such that
$\max\{s^1_{1:t}, s^2_{1:t}\} \to \infty$ as $t \to \infty$.

A probabilistic algorithm of following the leader in the game with two experts
is based on a computable function $f$ which, given the cumulative gains $s^1_{1:t-1}$ and
$s^2_{1:t-1}$ of the experts in hindsight, outputs the probability of following the first
expert $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$ and the probability of following the second
expert $P\{I = 2\} = 1 - P\{I = 1\}$.

The analysis in the case when these probabilities depend on the whole history of
gains is similar.

Let two experts be given. The master algorithm works as follows.
Probabilistic algorithm of following the leader.
FOR $t = 1, \dots, T$

Given the past cumulative gains of the experts $s^1_{1:t-1}$ and $s^2_{1:t-1}$, choose the expert
$i \in \{1, 2\}$ with probability $P\{I = i\}$.

Receive the one-step gains $s_t^1$ and $s_t^2$ of the two experts at step $t$ and define the
one-step gain $s_t = s_t^i$ of the master algorithm.
ENDFOR
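
Since the master is randomized, its expected cumulative gain can be computed exactly from the probabilities $P\{I = 1\}$. The sketch below does this for an arbitrary rule $f$; the function name `expected_gain` and the two sample rules are hypothetical illustrations.

```python
def expected_gain(f, gains1, gains2):
    """Expected cumulative gain of the master that follows expert 1 with
    probability f(s1, s2), where s1, s2 are the past cumulative gains."""
    s1 = s2 = 0.0
    total = 0.0
    for g1, g2 in zip(gains1, gains2):
        p1 = f(s1, s2)                      # P{I = 1}, fixed before the step
        total += p1 * g1 + (1.0 - p1) * g2  # expected one-step gain E(s_t)
        s1 += g1
        s2 += g2
    return total


def uniform(s1, s2):
    """Follow each expert with probability 1/2."""
    return 0.5


def leader(s1, s2):
    """Deterministically follow the current leader (ties go to expert 1)."""
    return 1.0 if s1 >= s2 else 0.0


value_uniform = expected_gain(uniform, [1, 0, 2], [0, 3, 0])
value_leader = expected_gain(leader, [1, 0, 2], [0, 3, 0])
```

On this small game both experts end with cumulative gain 3; the uniform rule collects exactly half of the total gain, while the deterministic leader rule misses the large gains.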

The following theorem says that if a probabilistic algorithm of following the 
leader has high performance in games with bounded one-step gains then its 
performance in games with unbounded one-step gains can be much worse than 
the performance of some experts. 

Theorem 1. Let $\delta, \delta'$ be arbitrarily close and arbitrarily small positive real
numbers such that $\delta' > \delta$, and suppose that for any two non-degenerate experts
with bounded one-step gains $s_t^i$, $i = 1, 2$, i.e., such that $0 \le s_t^i \le 1$ for all $t$,
a master algorithm has the expected cumulative gain

$$E(s_{1:t}) \ge (1 - \delta) \max_{i=1,2} s^i_{1:t} \qquad (1)$$

for all sufficiently large $t$. Then there exist two experts with unbounded one-step
gains such that the expected cumulative gain of the master algorithm is bounded
from above

$$E(s_{1:t}) \le \delta' \max_{i=1,2} s^i_{1:t} \qquad (2)$$

for infinitely many $t$.

Proof. Let a master algorithm be given, and let $P\{I = 1\} = f(s^1, s^2)$ and
$P\{I = 2\} = 1 - P\{I = 1\}$ be the probabilities of choosing each of two
experts with cumulative gains $s^1$, $s^2$. The proof of the theorem uses the following
lemma.

Lemma 1. Let $\delta, \delta'$ be positive real numbers such that $\delta' > \delta$ and suppose that
for any two experts with bounded one-step gains the master algorithm has the
expected performance (1) for all sufficiently large $t$. Then for any two real numbers
$s^1$ and $s^2$ a number $\bar s^1$ exists such that $\bar s^1 > s^1$, $\bar s^1 > s^2$, and
$P\{I = 1\} > 1 - \delta'$, where $P\{I = 1\} = f(\bar s^1, s^2)$ (and $P\{I = 2\} = 1 - P\{I = 1\}$).

Proof. Suppose that for some pair $s^1, s^2$ of real numbers the contrary statement
holds. Then we can construct two experts with cumulative gains $s^1_{1:t-1}, s^2_{1:t-1}$,
$t = 1, 2, \dots$, and with one-step gains equal to 0 or 1 such that (1) is violated.

Define the sequences $s_t^1, s_t^2$, $t = 1, 2, \dots, t_0$, such that $s_t^1, s_t^2$ are equal to 0 or
1 and such that $s^1_{1:t_0} = s^1$ and $s^2_{1:t_0} = s^2$ for some $t_0$. After that, define $s_t^1 = 1$
and $s_t^2 = 0$ for all $t > t_0$. We have $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} \le 1 - \delta'$ for all
sufficiently large $t$. Then for the expected one-step gain of the master algorithm,

$$E(s_t) \le 1 \cdot P\{I = 1\} + 0 \cdot P\{I = 2\} \le 1 - \delta' = s_t^1 (1 - \delta')$$

holds for all these $t$. Since $s^1_{1:t} \to \infty$ as $t \to \infty$, we have $E(s_{1:t}) < (1 - \delta) s^1_{1:t}$
for all sufficiently large $t$. This is a contradiction with (1). Hence, for some $t$ we
have $s^1_{1:t} > s^2_{1:t}$ and $P\{I = 1\} > 1 - \delta'$, where $P\{I = 1\} = f(s^1_{1:t}, s^2_{1:t})$.
△

We define two experts with unbounded one-step gains as follows. Define $s_0^1 = 0$
and $s_0^2 = 0$. By Lemma 1 a number $s^1 > 0$ exists such that $P\{I = 1\} > 1 - \delta$,
where $P\{I = 1\} = f(s^1, 0)$. Define $s_1^1 = s^1$, $s_1^2 = 0$.

Let $t$ be even, and let $s^1_{1:t-1}$ and $s^2_{1:t-1}$ be defined on previous steps. We
will use the induction hypothesis: $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} > 1 - \delta$. By
definition this induction hypothesis holds for $t = 2$. Define the one-step gains of
experts 1 and 2 at step $t$: $s_t^1 = 0$ and $s_t^2 = M_t$, where $M_t = \frac{E(s_{1:t-1})}{\delta' - \delta}$ and $E(s_{1:t-1})$ is
the mathematical expectation of the cumulative gain of the master algorithm on
steps $< t$.

Let $t$ be odd. By Lemma 1 a number $\bar s^1$ exists such that $\bar s^1 > s^1_{1:t-1}$,
$\bar s^1 > s^2_{1:t-1}$, and $P\{I = 1\} > 1 - \delta$, where $P\{I = 1\} = f(\bar s^1, s^2_{1:t-1})$. Define
$s_t^1 = \bar s^1 - s^1_{1:t-1}$ and $s_t^2 = 0$. Then $s^1_{1:t} = \bar s^1$ and $s^2_{1:t} = s^2_{1:t-1}$. Evidently, the
induction hypothesis is valid after step $t$.

Let us prove that this definition is correct. Let $t$ be even. By the induction
hypothesis $s^1_{1:t-1} > s^2_{1:t-1}$ and $P\{I = 1\} > 1 - \delta$, where
$P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$. Then $P\{I = 2\} < \delta$. By definition $s^1_{1:t} = s^1_{1:t-1}$ and
$s^2_{1:t} = s^2_{1:t-1} + M_t$. Then we obtain an upper bound for the expected one-step gain of
the master algorithm

$$E(s_t) = s_t^1 P\{I = 1\} + s_t^2 P\{I = 2\} \le \delta M_t.$$

For the expected cumulative gain, we have

$$E(s_{1:t}) \le E(s_{1:t-1}) + \delta M_t \le \delta' (s^2_{1:t-1} + M_t) = \delta' s^2_{1:t}. \qquad (3)$$

Inequality (3) holds for all even steps $t$. △

By decreasing the lower bound on the performance of a probabilistic algorithm
for games with bounded one-step gain functions we can increase it for games
with unbounded gain functions. The limit case is given by the following simple
example. Evidently, the expected cumulative gain of the probabilistic algorithm
which chooses one of two experts with equal probabilities $\frac12$ has the lower bound

$$E(s_{1:t}) = \frac12 s^1_{1:t} + \frac12 s^2_{1:t} \ge \frac12 \max_{i=1,2} s^i_{1:t}. \qquad (4)$$

The following simple diagonal argument shows that the cumulative gain of
any probabilistic algorithm of following the leader can be bigger than this bound
for some experts; analogously, it can be smaller for some experts.

Proposition 1. For any $\delta$ such that $0 < \delta < 1$ and for any probabilistic
algorithm of following the best expert, two experts exist such that the expected
cumulative gain of this algorithm satisfies

$$E(s_{1:t}) \le \frac12 (1 + \delta) \max_{i=1,2} s^i_{1:t} \qquad (5)$$

for all sufficiently large $t$, where $s^1_{1:t}$, $s^2_{1:t}$ are the cumulative gains of these experts.
Analogously, two experts exist such that

$$E(s_{1:t}) \ge \frac12 (1 - \delta) \max_{i=1,2} s^i_{1:t} \qquad (6)$$

for all sufficiently large $t$.

Proof. Given a probabilistic algorithm of following the best expert and $\delta$ such
that $0 < \delta < 1$, define recursively the gains of expert 1 and expert 2 at any step
$t$ as follows. Let $s^1_{1:t-1}$ and $s^2_{1:t-1}$ be the cumulative gains of these experts incurred
at steps $< t$. Let $M_t = 2E(s_{1:t-1})/\delta$, where $E(s_{1:t-1})$ is the expected cumulative
gain of the master algorithm in the past.

If $P\{I = 1\} \ge \frac12$ then define $s_t^1 = 0$ and $s_t^2 = M_t$, and define $s_t^1 = M_t$
and $s_t^2 = 0$ otherwise. Then $E(s_t) = s_t^1 P\{I = 1\} + s_t^2 P\{I = 2\} \le \frac12 M_t$ and,
since $E(s_{1:t-1}) = \frac12 \delta M_t$,
$E(s_{1:t}) = E(s_{1:t-1}) + E(s_t) \le \frac12 (1 + \delta) M_t \le \frac12 (1 + \delta) \max_{i=1,2} s^i_{1:t}$
for all sufficiently large $t$.

To prove (6) define $s_t^1 = M_t$ and $s_t^2 = 0$ if $P\{I = 1\} \ge \frac12$, and define $s_t^1 = 0$
and $s_t^2 = M_t$ otherwise. The rest of the derivation is analogous to the proof of
(5). △
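
The diagonal construction behind (5) can be simulated for a concrete rule $f$. The sketch below rests on two assumptions that are not part of the text: the experts and the master start with a positive gain `start` (otherwise $M_t$ would stay zero), and the payment is $M_t = 2E(s_{1:t-1})/\delta$, chosen so that the arithmetic yields the factor $\frac12(1+\delta)$. The function names are hypothetical.

```python
def diagonal_adversary(f, delta, steps, start=1.0):
    """Build gains against a master with follow probability f(s1, s2).

    Assumptions of this sketch: both experts and the master start with a
    gain of `start`, and M_t = 2 * E(s_{1:t-1}) / delta.
    Returns (expected master gain, (cumulative gain 1, cumulative gain 2)).
    """
    s1 = s2 = expected = start
    for _ in range(steps):
        p1 = f(s1, s2)
        m = 2.0 * expected / delta
        if p1 >= 0.5:
            # The master leans towards expert 1, so expert 2 is paid.
            s2 += m
            expected += (1.0 - p1) * m
        else:
            # The master leans towards expert 2, so expert 1 is paid.
            s1 += m
            expected += p1 * m
    return expected, (s1, s2)


def proportional(s1, s2):
    """A sample rule: follow expert 1 with probability s1 / (s1 + s2)."""
    return s1 / (s1 + s2)


delta = 0.5
expected, (s1, s2) = diagonal_adversary(proportional, delta, steps=10)
```

Whatever rule is plugged in, the gain at each step goes to the expert followed with probability at most 1/2, which is what keeps the expected gain below the bound in (5).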



3 Asymptotically optimal algorithm of following the 
perturbed leader 



In this section we show that the bounds (1) and (2) obtained in Theorem 1 can
be achieved by some probabilistic algorithm. More precisely, using
the method of following the perturbed leader we construct a universal algorithm
such that for any $\delta$ with $0 < \delta < 1$ the lower bound

$$E(s_{1:t}) \ge (1 - \delta) \max_{i=1,2} s^i_{1:t}$$

is valid for all sufficiently large $t$ for two arbitrary experts ($i = 1, 2$) with bounded
one-step gain functions (and even in a more general case), and, at the same time,
for some $\delta' > 0$ the bound

$$E(s_{1:t}) \ge \delta' \max_{i=1,2} s^i_{1:t}$$

is valid for all experts with arbitrary unbounded one-step gain functions. Here
$E(s_{1:t})$ is the cumulative expected gain of the master algorithm.

Note that in this section the cumulative gain is always nonnegative: $s^i_{1:t} \ge 0$
for all $t$ and for $i = 1, 2$. In Section 4 we consider the case when the gains can
be negative, i.e., experts can incur losses. Recall that, for simplicity, we suppose
that at any step $t$ only one expert can receive a positive one-step gain, i.e., $s_t^1 = 0$
or $s_t^2 = 0$. We denote $s_t = \max\{s_t^1, s_t^2\}$.

Let $\xi_1^1, \xi_1^2, \xi_2^1, \xi_2^2, \dots$ be a sequence of i.i.d. random variables distributed
according to the exponential law with the density $p(t) = e^{-t}$.

We consider the FPL algorithm with learning rate

$$\epsilon_t = \frac{1}{\mu v_t}, \qquad (7)$$

where $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$, $t = 1, 2, \dots$, and $\mu$, where $0 < \mu < 1$, is a parameter
of the algorithm.

We suppose without loss of generality that $s_0^1 = s_0^2 = 1$. By definition the
sequence $\epsilon_1, \epsilon_2, \dots$ is non-increasing.

The FPL algorithm is defined as follows:

FPL algorithm.
FOR $t = 1, \dots, T$

Output the prediction of the expert $i = i_{\max}$ which maximizes

$$s^i_{1:t-1} + \frac{1}{\epsilon_{t-1}} \xi_t^i, \qquad (8)$$

where $i = 1, 2$ and $\epsilon_{t-1}$ is defined by (7).

Receive the one-step gains $s_t^i$ for experts $i = 1, 2$, and define the one-step gain
$s_t^{i_{\max}}$ of the master algorithm.
ENDFOR



Recall that a game $G$ of two experts is called non-degenerate if
$v_t = \max\{s^1_{1:t}, s^2_{1:t}\} \to \infty$ as $t \to \infty$, where $s^i_{1:t}$ is the cumulative gain
of the expert $i = 1, 2$ at step $t$. The number

$$Dev(G) = \limsup_{t \to \infty} \frac{s_t}{v_t}, \qquad (9)$$

where $s_t = \max\{s_t^1, s_t^2\}$, is called the deviation of the game $G$. For any game
$Dev(G) \le 1$ by definition. In any non-degenerate game $G$ with bounded one-step
gain function, i.e., such that $0 \le s_t \le A$ for all $t$ ($A$ is a positive real
number), $Dev(G) = 0$.
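
The ratio $s_t / v_t$ inside (9) is easy to tabulate on a finite prefix of a game. The sketch below contrasts a bounded game with a game whose gains double at every step; a finite prefix can only suggest the value of the limsup, and the function name is a hypothetical choice.

```python
def deviation_ratios(gains1, gains2):
    """Return the ratios s_t / v_t for t = 1, 2, ..., where
    s_t = max(s_t^1, s_t^2) and v_t is the larger cumulative gain."""
    c1 = c2 = 0.0
    ratios = []
    for g1, g2 in zip(gains1, gains2):
        c1 += g1
        c2 += g2
        v = max(c1, c2)
        ratios.append(max(g1, g2) / v if v > 0 else 0.0)
    return ratios


# Bounded game: expert 1 gains 1 at every step, so s_t / v_t = 1 / t.
bounded = deviation_ratios([1] * 1000, [0] * 1000)
# Unbounded game: expert 1 gains 2^t at step t, so s_t / v_t stays above 1/2.
doubling = deviation_ratios([2 ** t for t in range(1, 40)], [0] * 39)
```

In the bounded game the ratio tends to zero, in agreement with $Dev(G) = 0$, while in the doubling game it never drops below 1/2.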

Theorem 2. For any $\mu$ such that $0 < \mu < 1$ an FPL algorithm can be specified
such that for any non-degenerate game of two experts its expected cumulative
gain at any step $T$ has the lower bound

$$l_{1:T} \ge e^{-\frac{2}{\mu}} (1 - \mu) \max_{i=1,2} s^i_{1:T}, \qquad (10)$$

where $s^i_{1:T}$ is the cumulative gain of the expert $i = 1, 2$.
If $Dev(G) < \frac12 \mu\delta$ for some $0 < \delta < 1$ then

$$l_{1:T} \ge (1 - \delta)(1 - \mu) \max_{i=1,2} s^i_{1:T} \qquad (11)$$

holds for all sufficiently large $T$.$^1$

Proof. This theorem will follow from Theorem 3 and Corollary 2 below. In the
proof we follow the proof scheme of [5] and [7]. △
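
For two experts the choice probabilities of FPL are available in closed form, because the difference of two independent Exp(1) variables has the Laplace distribution. The sketch below uses this to compute the expected cumulative gain exactly and to compare it with the right-hand side of (10) on one unbounded game. It assumes the learning rate $1/(\mu v_{t-1})$ with the convention $v_0 = 1$; the function names and the sample game are hypothetical.

```python
import math


def prob_follow_first(s1, s2, eps):
    """P{s1 + xi1/eps > s2 + xi2/eps} for independent Exp(1) xi1, xi2.

    xi1 - xi2 is Laplace distributed, which gives the closed form below.
    """
    a = eps * (s2 - s1)
    return 0.5 * math.exp(-a) if a >= 0 else 1.0 - 0.5 * math.exp(a)


def fpl_expected_gain(gains, mu):
    """Expected cumulative gain of FPL with rate 1 / (mu * v_{t-1}).

    `gains` is a list of pairs (s_t^1, s_t^2); v_0 = 1 is an assumption.
    Returns (expected gain of the master, best cumulative expert gain).
    """
    c1 = c2 = 0.0
    total = 0.0
    for g1, g2 in gains:
        v_prev = max(1.0, c1, c2)
        p1 = prob_follow_first(c1, c2, 1.0 / (mu * v_prev))
        total += p1 * g1 + (1.0 - p1) * g2
        c1 += g1
        c2 += g2
    return total, max(c1, c2)


mu = 0.618
# One-step gains grow like 4^t and alternate between the two experts.
game = [(4.0 ** t, 0.0) if t % 2 else (0.0, 4.0 ** t) for t in range(12)]
gain, best = fpl_expected_gain(game, mu)
lower_bound = math.exp(-2.0 / mu) * (1.0 - mu) * best
```

On this game the computed expected gain stays above `lower_bound`; this is a single numerical instance under the stated assumptions, not a substitute for the proof.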

The analysis of optimality of the FPL algorithm is based on an intermediate
predictor IFPL (Infeasible FPL) with the learning rate

$$\epsilon_t = \frac{1}{\mu v_t}, \qquad (12)$$

where $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$.
IFPL algorithm.

FOR $t = 1, \dots, T$

Output the prediction of the expert $i = i_{\max}$ with the maximal value of

$$s^i_{1:t} + \frac{1}{\epsilon_t} \xi_t^i,$$

where $i = 1, 2$, $\epsilon_t$ is defined by (12), and $\xi_t^1$, $\xi_t^2$ are independent random variables
distributed according to the exponential distribution with the density $p(t) = e^{-t}$.

Receive the one-step gains $s_t^i$ for experts $i = 1, 2$, and define the one-step gain
$s_t^{i_{\max}}$ of the master algorithm.
ENDFOR

$^1$ The optimal value of $\mu$ for (10) is $\approx 0.618$. Then $l_{1:T} \ge 0.015 \max_{i=1,2} s^i_{1:T}$ in (10)
and $l_{1:T} \ge 0.382 (1 - \delta) \max_{i=1,2} s^i_{1:T}$ in (11). Comparing these bounds with (4), we
reveal a large gap between bounds (10) and (11). The author does not know whether
the lower bound (10) can be increased when $\mu \approx \frac12$ in (11).

The IFPL algorithm predicts under the knowledge of $s^1_{1:t}$ and $s^2_{1:t}$ ($v_t$ is
their maximum), which both may not be available at the beginning of step $t$. Using
the unknown value of $\epsilon_t$ (like $s^i_{1:t}$, $i = 1, 2$) is the main peculiarity of our version of
IFPL.

To distinguish the gains of the FPL and IFPL algorithms we denote by $s_t^I$ the
one-step gain of the FPL algorithm at step $t$ and by $s_t^J$ the one-step gain of the
IFPL algorithm. The expected one-step gains of the FPL and IFPL algorithms
at step $t$ are denoted $l_t = E(s_t^I)$ and $r_t = E(s_t^J)$.

Theorem 3. For any $\mu$, $0 < \mu < 1$, the expected one-step gain $l_t$ of the FPL
algorithm with learning rate (7) and the expected one-step gain $r_t$ of the IFPL
algorithm with learning rate (12) satisfy the inequalities

$$l_t \ge e^{-\frac{2}{\mu}} r_t \qquad (13)$$

for all $t$.

If $Dev(G) < \frac12 \mu\delta$ for some $0 < \delta < 1$ then

$$l_t \ge (1 - \delta) r_t \qquad (14)$$

holds for all sufficiently large $t$.

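Proof. For any $t > 0$, denote $\xi^1 = \xi_t^1$, $\xi^2 = \xi_t^2$ and consider two random variables

$$I = \begin{cases} 1 & \text{if } s^1_{1:t-1} + \frac{1}{\epsilon_{t-1}} \xi^1 > s^2_{1:t-1} + \frac{1}{\epsilon_{t-1}} \xi^2, \\ 2 & \text{otherwise} \end{cases}$$

and

$$J = \begin{cases} 1 & \text{if } s^1_{1:t} + \frac{1}{\epsilon_t} \xi^1 > s^2_{1:t} + \frac{1}{\epsilon_t} \xi^2, \\ 2 & \text{otherwise.} \end{cases}$$

Recall that $v_t = \max\{s^1_{1:t}, s^2_{1:t}\}$ for all $t$. For any real number $r$ we compare
the conditional probabilities $P\{I = 1 | \xi^2 = r\}$ with $P\{J = 1 | \xi^2 = r\}$ and
$P\{I = 2 | \xi^2 = r\}$ with $P\{J = 2 | \xi^2 = r\}$.

In our analysis, the nontrivial cases are $s^2_{1:t} = s^2_{1:t-1} + s_t$ and $s^1_{1:t} = s^1_{1:t-1}$,
or $s^1_{1:t} = s^1_{1:t-1} + s_t$ and $s^2_{1:t} = s^2_{1:t-1}$, where $s_t > 0$ (we indicate these cases in
(17)-(19) below by $\pm$). In this case the following chain of equalities is valid:

$$P\{I = 1 | \xi^2 = r\} = P\{s^1_{1:t-1} + \tfrac{1}{\epsilon_{t-1}} \xi^1 > s^2_{1:t-1} + \tfrac{1}{\epsilon_{t-1}} r \,|\, \xi^2 = r\} =$$

$$P\{\xi^1 > \epsilon_{t-1} (s^2_{1:t-1} - s^1_{1:t-1}) + r \,|\, \xi^2 = r\} =$$

$$P\{\xi^1 > \epsilon_t (s^2_{1:t-1} - s^1_{1:t-1}) + (\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1}) + r \,|\, \xi^2 = r\} = \qquad (15)$$

$$e^{-(\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1})} P\{\xi^1 > \tfrac{1}{\mu v_t} (s^2_{1:t-1} - s^1_{1:t-1}) + r \,|\, \xi^2 = r\} = \qquad (16)$$

$$e^{-(\epsilon_{t-1} - \epsilon_t)(s^2_{1:t-1} - s^1_{1:t-1}) \pm \frac{s_t}{\mu v_t}} P\{\xi^1 > \tfrac{1}{\mu v_t} (s^2_{1:t-1} \pm s_t - s^1_{1:t-1}) + r \,|\, \xi^2 = r\} = \qquad (17)$$

$$e^{\frac{s_t}{\mu v_t} \left(\gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1\right)} P\{\xi^1 > \tfrac{1}{\mu v_t} (s^2_{1:t} - s^1_{1:t}) + r \,|\, \xi^2 = r\} = \qquad (18)$$

$$e^{\frac{s_t}{\mu v_t} \left(\gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1\right)} P\{J = 1 | \xi^2 = r\}. \qquad (19)$$

Here we have used twice, in (15)-(16) and in (16)-(17), the equality
$P\{\xi > a + b\} = e^{-b} P\{\xi > a\}$ for any random variable $\xi$ distributed according to the
exponential law; we also used the equality $v_t - v_{t-1} = \gamma_t s_t$, where $0 \le \gamma_t \le 1$,
in (18). The exponent in (19) is bounded:

$$2 \ge \gamma_t \frac{s^1_{1:t-1} - s^2_{1:t-1}}{v_{t-1}} \pm 1 \ge -2. \qquad (20)$$

These bounds follow from the inequalities $s^i_{1:t-1} / v_{t-1} \le 1$ and $s^i_{1:t-1} \ge 0$ for all
$t$ and for $i = 1, 2$. We also use the inequality $s_t / v_t \le 1$ for all $t$. Therefore,

$$e^{\frac{2}{\mu}} P\{J = 1 | \xi^2 = r\} \ge P\{I = 1 | \xi^2 = r\} \ge e^{-\frac{2}{\mu}} P\{J = 1 | \xi^2 = r\}. \qquad (21)$$

Since the inequality (21) holds for all $r$, it also holds unconditionally:

$$e^{\frac{2}{\mu}} P\{J = 1\} \ge P\{I = 1\} \ge e^{-\frac{2}{\mu}} P\{J = 1\}. \qquad (22)$$

Analogously, we obtain

$$e^{\frac{2}{\mu}} P\{J = 2\} \ge P\{I = 2\} \ge e^{-\frac{2}{\mu}} P\{J = 2\} \qquad (23)$$

for all $t = 1, 2, \dots$.

If $Dev(G) < \frac12 \mu\delta$, then for sufficiently large $t$ the exponential factor in (19) is
bounded from below by

$$e^{-\frac{2}{\mu} \frac{s_t}{v_t}} \ge e^{-\delta} \ge 1 - \delta.$$

From this (14) follows.

From (22) and (23) we obtain the lower bound (13):

$$l_t = E(s_t^I) = s_t^1 P(I = 1) + s_t^2 P(I = 2) \ge s_t^1 e^{-\frac{2}{\mu}} P(J = 1) + s_t^2 e^{-\frac{2}{\mu}} P(J = 2) = e^{-\frac{2}{\mu}} E(s_t^J) = e^{-\frac{2}{\mu}} r_t. \qquad (24)$$

△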

The connection between the expected cumulative gain of the IFPL algorithm

$$r_{1:T} = \sum_{t=1}^T r_t$$

and the expected cumulative gain of the FPL algorithm

$$l_{1:T} = \sum_{t=1}^T l_t$$

is given in the following corollary.

Corollary 1. For any $\mu$, $0 < \mu < 1$, the expected cumulative gains of
the IFPL and FPL algorithms with parameters defined in Theorem 3 satisfy the
following inequalities:

$$l_{1:T} \ge e^{-\frac{2}{\mu}} r_{1:T} \qquad (25)$$

for all $T$.

If $Dev(G) < \frac12 \mu\delta$ for some $0 < \delta < 1$ then

$$l_{1:T} \ge (1 - \delta) r_{1:T}$$

holds for all sufficiently large $T$.

The second bound also holds for games with unbounded one-step gains, and so it
improves on the results of [7] and [5].

The following theorem, which is an analogue of the result from [7], gives a
bound for the IFPL algorithm.

Theorem 4. The expected cumulative gain of the IFPL algorithm with the learning
rate (12) is bounded by

$$r_{1:T} \ge \max_{i=1,2} s^i_{1:T} - \frac{1}{\epsilon_T} \qquad (26)$$

for all $T$.

The proof is along the lines of the proof from [5] (which is a refinement of the
proof from [7]).

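Let in this proof $s_t = (s_t^1, s_t^2)$ be the vector of one-step gains and
$s_{1:t} = (s^1_{1:t}, s^2_{1:t})$ be the vector of cumulative gains of the two experts, and let $\xi$
be a vector whose coordinates are the random variables $\xi^1$ and $\xi^2$. Define
$\epsilon_0 = \infty$ and

$$\tilde s_{1:t} = s_{1:t} + \frac{1}{\epsilon_t} \xi$$

for $t = 1, 2, \dots$. Consider the one-step gains
$\tilde s_t = s_t + \xi \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right)$
for the moment. For any vector $s$ and a unit vector $d$ denote

$$M(s) = \arg\max_{d \in D} \{d \circ s\},$$

where $D = \{(0, 1)^T, (1, 0)^T\}$ is the set of two unit vectors of dimension 2 and $\circ$
is the inner product of two vectors.
We first show that

$$\sum_{t=1}^T M(\tilde s_{1:t}) \circ \tilde s_t \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T}. \qquad (27)$$

For $T = 1$ this is obvious. For the induction step from $T - 1$ to $T$ we need to
show that

$$M(\tilde s_{1:T}) \circ \tilde s_T \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T} - M(\tilde s_{1:T-1}) \circ \tilde s_{1:T-1}.$$

This follows from $\tilde s_{1:T} = \tilde s_{1:T-1} + \tilde s_T$ and
$M(\tilde s_{1:T}) \circ \tilde s_{1:T-1} \le M(\tilde s_{1:T-1}) \circ \tilde s_{1:T-1}$.
We rewrite (27) as follows:

$$\sum_{t=1}^T M(\tilde s_{1:t}) \circ s_t \ge M(\tilde s_{1:T}) \circ \tilde s_{1:T} - \sum_{t=1}^T M(\tilde s_{1:t}) \circ \xi \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right). \qquad (28)$$

By the definition of $M$ we have

$$M(\tilde s_{1:T}) \circ \tilde s_{1:T} \ge M(s_{1:T}) \circ \left(s_{1:T} + \frac{\xi}{\epsilon_T}\right) = \max_{d \in D} \{d \circ s_{1:T}\} + M(s_{1:T}) \circ \frac{\xi}{\epsilon_T}. \qquad (29)$$

The expectation of the last term in (29) is equal to $\frac{1}{\epsilon_T}$. We have also

$$\sum_{t=1}^T \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right) M(\tilde s_{1:t}) \circ \xi \le \sum_{t=1}^T \left(\frac{1}{\epsilon_t} - \frac{1}{\epsilon_{t-1}}\right) M(\xi) \circ \xi. \qquad (30)$$

We have $P\{\max_i \xi^i \ge y\} \le P\{\xi^1 \ge y\} + P\{\xi^2 \ge y\} = 2e^{-y}$. Since

$$E(M(\xi) \circ \xi) = E(\max\{\xi^1, \xi^2\}) \le \int_0^\infty 2e^{-y} \, dy = 2,$$

the expectation of (30) has the upper bound $\frac{2}{\epsilon_T}$. Combining the bounds (28)-(30)
we obtain (26). △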
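
The only probabilistic estimate used in this proof is that the expected maximum of two independent Exp(1) variables is at most 2. The sketch below evaluates the tail integral numerically, using the exact tail $P\{\max \ge y\} = 1 - (1 - e^{-y})^2$; the function name, the truncation point, and the grid size are hypothetical choices.

```python
import math


def expected_max_of_two_exponentials(upper=50.0, n=200_000):
    """Midpoint-rule value of the integral of P{max(xi1, xi2) >= y} over
    [0, upper], where P{max >= y} = 1 - (1 - e^{-y})^2 for Exp(1) variables."""
    h = upper / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        total += 1.0 - (1.0 - math.exp(-y)) ** 2
    return total * h


value = expected_max_of_two_exponentials()
```

The integral evaluates to about 3/2, so the union bound $2e^{-y}$ used in the proof is safe but not tight.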

Corollary 2. Let $\mu$, $0 < \mu < 1$, be given. If the game of two experts is
non-degenerate then the expected cumulative gain of the IFPL algorithm is bounded
by

$$r_{1:T} \ge (1 - \mu) \max_{i=1,2} s^i_{1:T}.$$



4 Zero sum games 

We consider the simplest example of a game of prediction with expert advice
with arbitrary positive and negative one-step gains and losses. We apply these
results in Appendix A.

We consider a game $G$ of two experts with zero sum, i.e., $s_t^1 = -s_t^2$ at each
step $t$ of the game. If a one-step gain is negative it is called a loss. There are no
restrictions on the absolute values of $s_t^i$. Define the volume of the game at step $t$

$$V_t = \sum_{j=1}^t |s_j^1|.$$

A game with zero sum is called non-degenerate if $\lim_{t \to \infty} V_t = \infty$. Analogously to
(9) we consider the deviation of the game $G$ with zero sum

$$Dev(G) = \limsup_{t \to \infty} \frac{s_t}{V_t},$$

where $s_t = |s_t^1|$ and $V_t$ is the volume of the game.

Evidently, the expected cumulative gain of the algorithm which chooses one
of two experts with probability $\frac12$ equals zero.

The following proposition is an analogue of Proposition 1.

Proposition 2. For any probabilistic algorithm of following the best expert, two
experts exist such that the expected cumulative gain of this algorithm satisfies
$E(s_{1:t}) \le 0$, and two experts exist such that $E(s_{1:t}) \ge 0$, for all $t$.

Proof. If $P\{I = 1\} \ge \frac12$ define $s_t^1 = 1$, $s_t^2 = -1$, and define $s_t^1 = -1$, $s_t^2 = 1$
otherwise. The remaining estimates are analogous to those given in the proof of
Proposition 1. △

The following theorem, which is an analogue of Theorem 1 for games
with zero sum, shows that if a probabilistic algorithm of following the leader
has high performance in games with bounded one-step gains then its expected
cumulative gain in some games with unbounded one-step expert gains can be
arbitrarily negative.

Theorem 5. Let $L_t$, $t = 1, 2, \dots$, be any sequence of positive real numbers. Let
$\delta, \delta'$ be arbitrarily close and arbitrarily small positive real numbers such that
$\delta' > \delta$, and suppose that for any two experts with bounded one-step gains, i.e.,
such that $0 \le s_t \le 1$ for all $t$, the expected cumulative gain of the master
algorithm has the lower bound

$$E(s_{1:t}) \ge (1 - \delta) |s^1_{1:t}| \qquad (31)$$

for all sufficiently large $t$. Then there exist two non-degenerate experts with
unbounded one-step gains such that the expected performance of the master
algorithm is bounded from above

$$E(s_{1:t}) \le 2\delta' |s^1_{1:t}| - (1 - 2\delta') V_t \qquad (32)$$

and such that $V_t \ge L_t$ for infinitely many $t$, where $V_t$ is the volume of the game.



Proof. The proof is similar to the proof of Theorem 1. It uses a modified
version of Lemma 1 which is also valid for negative gains with some evident
modifications.$^2$

Let a master algorithm be given. We define two experts with unbounded
one-step gains as follows. Define $s_0^1 = s_0^2 = 0$. By the modified version of Lemma 1
a number $s^1$ exists such that $s^1 > 0$ and $P\{I = 1\} > 1 - \delta$, where
$P\{I = 1\} = f(s^1, -s^1)$.

Let $t$ be even, and let $s^1_{1:t-1}$ and $s^2_{1:t-1} = -s^1_{1:t-1}$ be defined on previous
steps. We will use the induction hypothesis: $s^1_{1:t-1} > 0$ and $P\{I = 1\} > 1 - \delta$,
where $P\{I = 1\} = f(s^1_{1:t-1}, s^2_{1:t-1})$.

Define the one-step gains of experts 1 and 2: $s_t^1 = -M_t$ and $s_t^2 = M_t$, where

$$M_t = \max\left\{\frac{|E(s_{1:t-1})|}{2(\delta' - \delta)}, L_t, \frac{V_{t-1}}{\delta}\right\}.$$

Let $t$ be odd. By the modified version of Lemma 1 a number $\bar s^1$ exists such
that $\bar s^1 > |s^1_{1:t-1}|$ and $P\{I = 1\} > 1 - \delta$, where $P\{I = 1\} = f(\bar s^1, -\bar s^1)$.
Define $s_t^1 = \bar s^1 - s^1_{1:t-1}$, then $s^1_{1:t} = \bar s^1$, and $s_t^2 = -s_t^1$. Evidently, the induction
hypothesis is valid after odd step $t$.

Let us prove that this construction is correct. Let $t$ be even. Then by the
induction hypothesis $s^1_{1:t-1} > 0$ and $P\{I = 1\} > 1 - \delta$ (and $P\{I = 2\} < \delta$). By
definition $s^1_{1:t} = s^1_{1:t-1} - M_t$ and $s^2_{1:t} = s^2_{1:t-1} + M_t$. The expected one-step gain
of the master algorithm is bounded:

$$E(s_t) = s_t^1 P\{I = 1\} + s_t^2 P\{I = 2\} \le -(1 - \delta) M_t + \delta M_t = -(1 - 2\delta) M_t.$$

By definition $L_t \le M_t \le V_t = V_{t-1} + M_t \le (1 + \delta) M_t$. Then

$$E(s_{1:t}) \le E(s_{1:t-1}) - (1 - 2\delta) M_t \le -(1 - 2\delta') M_t = -M_t + 2\delta' M_t \le -(1 - \delta) V_t + 2\delta' |s^1_{1:t}|$$

for all even steps $t$. △

We consider non-degenerate games, i.e., such that $V_t$ is unbounded.

To obtain the lower bounds we reduce our zero-sum game to a game with
nonnegative one-step gains. Define the one-step gains of new experts
$\tilde s_t^i = s_t^i + |s_t^i|$ for $i = 1, 2$. Then $\tilde s_t^i \ge 0$ for all $t$, and $\tilde s_t^1 = 0$ or
$\tilde s_t^2 = 0$ for all $t$. By definition $\tilde s^i_{1:t} = s^i_{1:t} + V_t$ for $i = 1, 2$, where $V_t$ is
the volume of the initial game. Evidently, the FPL and IFPL algorithms defined in
Section 3 make the same choices for experts of both types.

The expected one-step gains of the master algorithm for experts of both
types satisfy $\tilde l_t = s_t^1 P\{I = 1\} + s_t^2 P\{I = 2\} + |s_t^1|$. This implies the equality
$\tilde l_{1:t} = l_{1:t} + V_t$ for the expected cumulative gains. The analogous equalities hold
for $\tilde r_t$, $\tilde r_{1:t}$ and $r_t$, $r_{1:t}$.
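
The reduction can be checked on a small zero-sum game. The sketch below shifts the gains of both experts by the absolute value of the one-step gain and compares the cumulative gains with the volume; the function name and the sample gains are hypothetical.

```python
def shift_zero_sum_game(gains1):
    """Given expert 1's gains s_t^1 of a zero-sum game (s_t^2 = -s_t^1),
    return the shifted nonnegative gains s_t^i + |s_t^i| of both experts."""
    shifted1 = [g + abs(g) for g in gains1]
    shifted2 = [-g + abs(g) for g in gains1]
    return shifted1, shifted2


gains1 = [3.0, -1.0, -5.0, 2.0]
new1, new2 = shift_zero_sum_game(gains1)
volume = sum(abs(g) for g in gains1)  # V_t for the whole sample game
```

Each shifted cumulative gain exceeds the original one by exactly the volume, and at every step one of the two shifted gains is zero, so the shifted game has the form studied in Section 3.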

The following theorem is a corollary of Theorem 2. 



$^2$ A modified version of Lemma 1 reads as follows: Let $\delta, \delta'$ be positive real numbers
such that $\delta' > \delta$, and suppose that for any two experts with bounded one-step gains (31)
holds for all sufficiently large $t$. Then for any number $s^1$ a number $\bar s^1 > 0$ exists such
that $\bar s^1 > s^1$ and $P\{I = 1\} > 1 - \delta'$, where $P\{I = 1\} = f(\bar s^1, -\bar s^1)$.



Theorem 6. For any $\mu$ such that $0 < \mu < 1$ an FPL algorithm can be specified
such that for any non-degenerate game of two experts its expected cumulative
gain at any step $T$ has the lower bound

$$l_{1:T} \ge e^{-\frac{2}{\mu}} (1 - \mu) |s^1_{1:T}| - V_T \left(1 - e^{-\frac{2}{\mu}} (1 - \mu)\right), \qquad (33)$$

where $s^1_{1:T}$ is the cumulative gain of the first expert and $V_T$ is the volume of
the game at step $T$.

If $Dev(G) < \frac12 \mu\delta$ for some $0 < \delta < 1$ then

$$l_{1:T} \ge (1 - \delta)(1 - \mu) |s^1_{1:T}| - (\delta + \mu) V_T \qquad (34)$$

holds for all sufficiently large $T$.

Proof. This theorem follows from Theorem 2 and the relations between the one-step
gains $\tilde s_t^i$ and $s_t^i$, $i = 1, 2$, of the two types of experts. △

Remark. In the case when $Dev(G) < \frac12 \mu\delta$, the bound (34) can be improved for
some $t$ if we replace in Section 3 the learning rate (7) by

$$\epsilon_t = \frac{1}{\mu \max_{j \le t} |s^1_{1:j}|}.$$

Then the inequality (34) can be obtained directly (without using the modified
experts) from inequalities (22) and (23). We can prove that for any $T$ such that
$T = \arg\max_{t \le T} |s^1_{1:t}|$,

$$l_{1:T} \ge (1 - \delta)(1 - \mu) |s^1_{1:T}|.$$

A Learning the fractional Brownian motion 

In this section we present an example of the zero-sum game studied in Section 4.
Rogers [8], Delbaen and Schachermayer [2], and Cheridito [3] have constructed
arbitrage strategies for a financial market that consists of a money market
account and a stock whose price follows a fractional Brownian motion (in
continuous time) with drift or an exponential fractional Brownian motion with drift.
Vovk [9] has reformulated these strategies for discrete time.

Let $S_0, S_1, \dots, S_n, \dots$ be a sequence of prices of some financial instrument
such as a stock or a bond. We consider the following "financial" game between an
investor and the market. The investor can use long and short selling.
FOR $t = 1, 2, \dots, T-1$

At the beginning of the trading period the investor's cumulative income (or loss)
earned from the beginning of the game is $s_{1:t-1} = \sum_{i=1}^{t-1} s_i$.

At the beginning of the trading period, observing his past incomes and losses, the
investor determines the number $C_t$ of shares of the stock needed to realize his
strategy.

At the end of the trading period the market discloses the price $S_{t+1}$ of the stock,
and the investor incurs his current income or loss at the period $t$:

$$s_t = C_t (S_{t+1} - S_t).$$

ENDFOR

Denote $\Delta S_t = S_{t+1} - S_t$. We have the following equality:

$$(S_T - S_0)^2 = \left(\sum_{t=0}^{T-1} \Delta S_t\right)^2 = \sum_{t=0}^{T-1} 2 (S_t - S_0) \Delta S_t + \sum_{t=0}^{T-1} (\Delta S_t)^2. \qquad (35)$$
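
The identity (35) can be verified numerically on any price path. The sketch below uses a Gaussian random walk as a stand-in for the prices, which is an assumption made only for illustration, and also accumulates the income of a trader who holds $2C(S_t - S_0)$ shares during period $t$; multiplying (35) by $C$ shows that this income equals $C$ times the difference between the squared total move and the sum of squared increments. The function name is hypothetical.

```python
import random


def check_identity(prices, c=1.0):
    """Return both sides of (35), the income of the trader holding
    2 * c * (S_t - S_0) shares in period t, and the sum of squared increments."""
    s0 = prices[0]
    deltas = [b - a for a, b in zip(prices, prices[1:])]
    lhs = (prices[-1] - s0) ** 2
    cross = sum(2.0 * (s - s0) * d for s, d in zip(prices, deltas))
    squares = sum(d * d for d in deltas)
    income = sum(2.0 * c * (s - s0) * d for s, d in zip(prices, deltas))
    return lhs, cross + squares, income, squares


rng = random.Random(0)
prices = [100.0]
for _ in range(50):
    prices.append(prices[-1] + rng.gauss(0.0, 1.0))
lhs, rhs, income, squares = check_identity(prices, c=1.0)
```

The income is positive when the total move dominates the accumulated squared increments (a trending path) and negative otherwise (a path that oscillates without drifting far).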

The equality (35) leads to two strategies which are represented by two
experts. At the beginning of step $t$ Expert 1 holds the number of shares

$$C_t^1 = 2C (S_t - S_0), \qquad (36)$$

and Expert 2 holds the number of shares

$$C_t^2 = -2C (S_t - S_0), \qquad (37)$$

where $C$ is an arbitrary positive constant.

These strategies at step $t$ earn the incomes $s_t^1 = 2C (S_t - S_0) \Delta S_t$ and
$s_t^2 = -s_t^1$. By (35), the strategy (36) earns in $T$ steps of the game the income

$$s^1_{1:T} = C \left((S_T - S_0)^2 - \sum_{t=0}^{T-1} (\Delta S_t)^2\right).$$

The strategy (37) earns in $T$ steps the income $s^2_{1:T} = -s^1_{1:T}$.

The number of shares $C_t^1 = 2C (S_t - S_0)$ in the strategy (36) or the number
of shares $C_t^2 = -2C (S_t - S_0)$ in the strategy (37) can be positive or negative.
Expert 1 uses the hypothesis that the Hurst exponent of the price of the stock is
$> \frac12$ (a smoother trend). Expert 2 uses the hypothesis that the Hurst exponent
is $< \frac12$ (volatility is high).

It is reasonable to derandomize the FPL algorithm for this financial game.
For that, the investor must follow both expert strategies simultaneously, holding
$P\{I = 1\} C_t^1 + P\{I = 2\} C_t^2$ shares of the stock at any step $t$. In this case Theorem 6
holds, where the expected gain at step $t$ is replaced by the pure gain
$s_t = P\{I = 1\} C_t^1 \Delta S_t + P\{I = 2\} C_t^2 \Delta S_t$.